feat: end-to-end OpenAI Image Generation support #1280
Conversation
Force-pushed from 9405f3d to 46012bb (compare)
@nutanix-Hrushikesh some admin items on this PR: the DCO and the PR style need to be fixed.

To make this complete, I would recommend following the pattern in testopenai (add a cassette for the new request handled and record it), then record a new span JSON in testopeninference (this ensures the data we capture is actually per implementation and not accidentally different). Both have READMEs, but let me know if any of it is unclear.
internal/extproc/server.go (outdated)
}

// Debug details about the processor selection.
if s.logger != nil {
Is this required?

I'll remove some extra logging.

Can you remove this entire block?

@nutanix-Hrushikesh also

@nutanix-Hrushikesh do you want to make it more complete by following @codefromthecrypt's comment?
Force-pushed from 390e4d0 to 7907692 (compare)
I have recorded the cassette and span. Please let me know if this is expected.
Force-pushed from 7907692 to f51d37f (compare)
Getting farther! A couple notes
internal/apischema/openai/openai.go (outdated)
// ModelTextEmbedding3Small is the cheapest model usable with /embeddings.
ModelTextEmbedding3Small = "text-embedding-3-small"

// ModelDALLE2 is the DALL-E 2 model usable with /v1/images/generations.
Before, the rationale for an entry here was the cheapest model that can be used to produce test data. I would look up which of these, or a different one, is cheapest. Only add one, and document it like the others if it is the cheapest.
Similarly, when you make the cassette you will notice that the existing text-to-speech, image-to-text, etc. requests use the cheapest possible request in terms of cost and size. You can ask AI to help you figure that out. For example, I used Grok to figure out a cheap audio request.

Used the smaller model gpt-image-1-mini.
Also, there is an intermittent issue in the test: the cassette file is sometimes not saved, because the server is closed before the file is written to disk. Added a temporary fix to sleep 5 seconds before closing.
Force-pushed from dc13e02 to dee84d4 (compare)
will review next week 🙏 sorry for the delay
Force-pushed from dee84d4 to 16df789 (compare)
thanks @nutanix-Hrushikesh, I left a few minor comments. From now on, could you avoid force pushing, per CONTRIBUTING.md, for easier review. Also make sure that you remove any debugging lines.
.env.ollama (outdated)
THINKING_MODEL=qwen3:1.7b
COMPLETION_MODEL=qwen2.5:0.5b
EMBEDDINGS_MODEL=all-minilm:33m
IMAGE_GENERATION_MODEL=dall-e-2
how is this Ollama?
I know this won't work, but there is no image generation model available with Ollama.
// Extract image generation metadata for metrics (model may be absent in SDK response)
imageMetadata.ImageCount = len(resp.Data)
imageMetadata.Model = ""
like other endpoints, we should assume that the requested model == response model if the response lacks the model name. so could you apply the patch like this?
diff --git a/internal/extproc/translator/imagegeneration_openai_openai.go b/internal/extproc/translator/imagegeneration_openai_openai.go
index 9e5d74ba..d5ab5ff5 100644
--- a/internal/extproc/translator/imagegeneration_openai_openai.go
+++ b/internal/extproc/translator/imagegeneration_openai_openai.go
@@ -6,6 +6,7 @@
package translator
import (
+ "cmp"
"encoding/json"
"fmt"
"io"
@@ -32,11 +33,12 @@ type openAIToOpenAIImageGenerationTranslator struct {
// The path of the images generations endpoint to be used for the request. It is prefixed with the OpenAI path prefix.
path string
// span is the tracing span for this request, inherited from the router filter.
- span tracing.ImageGenerationSpan
+ span tracing.ImageGenerationSpan
+ requestModel internalapi.RequestModel
}
// RequestBody implements [ImageGenerationTranslator.RequestBody].
-func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _ *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
+func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, p *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
headerMutation *extprocv3.HeaderMutation, bodyMutation *extprocv3.BodyMutation, err error,
) {
var newBody []byte
@@ -47,6 +49,7 @@ func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _
return nil, nil, fmt.Errorf("failed to set model name: %w", err)
}
}
+ o.requestModel = cmp.Or(o.modelNameOverride, p.Model)
// Always set the path header to the images generations endpoint so that the request is routed correctly.
headerMutation = &extprocv3.HeaderMutation{
@@ -144,9 +147,9 @@ func (o *openAIToOpenAIImageGenerationTranslator) ResponseBody(_ map[string]stri
tokenUsage.TotalTokens = uint32(resp.Usage.TotalTokens) //nolint:gosec
}
- // Extract image generation metadata for metrics (model may be absent in SDK response)
+ // Extract image generation metadata for metrics.
imageMetadata.ImageCount = len(resp.Data)
- imageMetadata.Model = ""
+ imageMetadata.Model = o.requestModel // Model is not present in the response, so we assume the request model == response model.
imageMetadata.Size = string(resp.Size)
return
@nutanix-Hrushikesh this is unresolved yet
OK, overall almost LGTM. Can you edit the documentation at https://github.com/envoyproxy/ai-gateway/blob/main/site/docs/capabilities/llm-integrations/supported-endpoints.md ?
Codecov Report

@@ Coverage Diff @@
##             main    #1280      +/-   ##
==========================================
+ Coverage   78.27%   78.60%   +0.32%
==========================================
  Files         132      139       +7
  Lines       13349    13745     +396
==========================================
+ Hits        10449    10804     +355
- Misses       2260     2287      +27
- Partials      640      654      +14
Force-pushed from 82361c0 to fa27f96 (compare)

@nutanix-Hrushikesh I see some bad merge here, could you clean up the commit?
# completion is the standard OpenAI client (`openai` in pip), instrumented
# with the following OpenTelemetry instrumentation libraries:
# - openinference-instrumentation-openai (completions spans)
# - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
completion:
  build:
    context: ../../tests/internal/testopeninference
    dockerfile: Dockerfile.openai_client
    target: completion
  container_name: completion
  profiles: ["test"]
  env_file:
    - ../../.env.ollama
    - .env.otel.${COMPOSE_PROFILES:-console}
  environment:
    - OPENAI_BASE_URL=http://aigw:1975/v1
    - OPENAI_API_KEY=unused
do not delete irrelevant things.
can you not lie?
ai-gateway/cmd/aigw/docker-compose-otel.yaml
Lines 126 to 142 in 8ee7ae1

# completion is the standard OpenAI client (`openai` in pip), instrumented
# with the following OpenTelemetry instrumentation libraries:
# - openinference-instrumentation-openai (completions spans)
# - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
completion:
  build:
    context: ../../tests/internal/testopeninference
    dockerfile: Dockerfile.openai_client
    target: completion
  container_name: completion
  profiles: ["test"]
  env_file:
    - ../../.env.ollama
    - .env.otel.${COMPOSE_PROFILES:-console}
  environment:
    - OPENAI_BASE_URL=http://aigw:1975/v1
    - OPENAI_API_KEY=unused
my bad, I think I was referring to an old file.
Signed-off-by: Hrushikesh Patil <[email protected]>
cmd/aigw/config_test.go (outdated)
// Clear any existing env vars
t.Setenv("OPENAI_API_KEY", "")
t.Setenv("OPENAI_BASE_URL", "")
t.Setenv("AZURE_OPENAI_API_KEY", "")
how is this relevant to this PR?
tests were failing locally; I added this to fix that, but I'll remove it.
cmd/aigw/main_test.go (outdated)
{
	name: "run no arg",
	args: []string{"run"},
	env:  map[string]string{"OPENAI_API_KEY": "", "AZURE_OPENAI_API_KEY": ""},
same
Signed-off-by: Hrushikesh Patil <[email protected]>
tests/extproc/mcp/mcp_test.go (outdated)
ctx, cancel := context.WithCancel(t.Context()) //nolint: govet
ctx, cancel := context.WithCancel(t.Context())
defer cancel()
irrelevant
Signed-off-by: Hrushikesh Patil <[email protected]>
// ImageGenerationError represents an error response from the OpenAI Images API.
// This schema matches OpenAI's documented error wire format.
type ImageGenerationError struct {
why can't you use openai.Error like in other places?
can you revert the unrelated changes, like the grouping etc., as well as bring back the reference URL to OTel?
The reason is that the "grouping comment" will be considered a comment for the first entry and not for the others. That, I feel, is confusing, so I would rather not do it. Instead, if we really want this, we should put documentation comments on each of the constants rather than the partial grouping in the current state.
sure
Done
almost there!
Use a different AIGW because Ollama does not currently support the image generation model. This setup requires the OpenAI API key to be set in environment variables. This is a temporary workaround until Ollama adds image generation support. Signed-off-by: Hrushikesh Patil <[email protected]>
imageInfo: mustRegisterHistogram(
	meter,
	"ai_gateway.image.generation",
	metric.WithDescription("Image generation request marker with image-specific attributes"),
),
why do you need this? Can't you just use gen_ai.client.operation.duration, or whatever existing well-defined OTel metrics, and add additional attributes for the image-specific stuff? I don't think this additional custom metric (not even documented in this PR) is necessary. Can you remove it?
I even think having them all in the attributes is a really bad idea, as "size" has infinite cardinality, and "image count" as well.
// for metrics and observability.
type ImageGenerationMetadata struct {
// ImageCount is the number of images generated in the response.
ImageCount int
// Model is the AI model used for image generation.
Model string
// Size is the size/dimensions of the generated images.
Size string
}
so can you:
- Remove this metric.
- Remove ImageGenerationMetadata.
For size, there are only 7-8 size options.
Image count has infinite cardinality. Should I keep size?
yes, fine, but please remove this metric anyway
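If the size attribute stays on an existing well-defined metric, one way to guarantee bounded cardinality is to normalize the value against the small set of documented image sizes before recording it, so even malformed input cannot create new attribute values. This is a hypothetical helper, not code from the PR; the allow-list reflects the sizes OpenAI documents for dall-e-2/3 and gpt-image-1:

```go
package main

import "fmt"

// allowedImageSizes bounds metric-attribute cardinality: only known
// size strings are recorded verbatim; everything else collapses to
// "other". (Illustrative helper, not part of the PR.)
var allowedImageSizes = map[string]struct{}{
	"256x256": {}, "512x512": {}, "1024x1024": {},
	"1024x1536": {}, "1536x1024": {},
	"1024x1792": {}, "1792x1024": {},
	"auto": {},
}

// sizeAttribute returns a bounded-cardinality value suitable for use
// as a metric attribute.
func sizeAttribute(size string) string {
	if _, ok := allowedImageSizes[size]; ok {
		return size
	}
	return "other"
}

func main() {
	fmt.Println(sizeAttribute("1024x1024")) // 1024x1024
	fmt.Println(sizeAttribute("999x999"))   // other
}
```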
…adme Signed-off-by: Hrushikesh Patil <[email protected]>
…penai error type Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Hrushikesh Patil <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@nutanix-Hrushikesh thank you for the multiple iterations. It's good to finally see this landing!
Description
This PR adds complete support for OpenAI’s image generation endpoint (/v1/images/generations) across the Envoy AI Gateway. It introduces a processor, translation layer, tracing and metrics instrumentation, Brotli decoding, example client/service updates, and repo hygiene improvements.
Changes
ExtProc (image generation)
- Added `imageGenerationProcessorFactory` and registered it in ExtProc main.
- Added `imageGenerationProcessorRouterFilter` and `imageGenerationProcessorUpstreamFilter`.
- Added `imageGenerationMetrics` and integrated it with the processor to record image-specific telemetry.

Translator (OpenAI to OpenAI)
- Added the `ImageGenerationTranslator` interface and `ImageGenerationMetadata` to standardize request/response translation for images.

Tracing
- Added `ImageGenerationTracer` and `ImageGenerationRecorder`.
- Implemented `imageGenerationTracer` and `imageGenerationSpan`, an `ImageGenerationRecorder` for router filter instrumentation, and a no-op tracer.

Metrics
- Added `ImageGenerationMetrics` with methods to record lifecycle events, model/backend selection, token usage, and image generation stats.
- Added the attributes `genaiAttributeImageCount`, `genaiAttributeImageModel`, `genaiAttributeImageSize` and the operation `genaiOperationImageGeneration`.

Utilities
- Extended `decodeContentIfNeeded` to support `brotli` encoding alongside gzip for modern upstreams.

CLI, docs, and examples
- `cmd/aigw/docker-compose.yaml`: added an `image-generation` service (curl-based client) modeled after chat/embeddings.
- `cmd/aigw/README.md`: added image generation usage (service and curl examples) and embeddings "create-embeddings" instructions; updated the OTEL section to include image-generation.

Tests and config
- Added the `/v1/images/generations` route in `tests/extproc/vcr/envoy.yaml`.

Repo hygiene
- `.gitignore`: ignore `tests/e2e-inference-extension/logs/` to prevent accidental log check-ins.

Bug Fixes / Improvements
- Added `ImageGenerationError`.

Tests

Dependencies / Migrations
- Added `github.com/andybalholm/brotli v1.2.0` (Brotli decoding).
- Added `github.com/xyproto/randomstring v1.0.5` (utility for upcoming features/tests).
- `github.com/openai/openai-go/v2` is leveraged for schema-safe decoding and tracing types.

Notes for Reviewers
- Review the translation for `/v1/images/generations`, including retry and header/body mutation behavior.
- Run the `image-generation` and `create-embeddings` services to validate the examples; confirm README clarity.
- Confirm the `.gitignore` path excludes e2e inference extension logs as intended.